Chemistry-specific Features and Heuristics for Developing a CRF-based Chemical Named Entity Recogniser
Abstract
We describe and compare methods developed for the BioCreative IV chemical compound and drug name recognition (CHEMDNER) task. The presented conditional random fields (CRF)-based named entity recogniser employs a statistical model trained on domain-specific features, in addition to those typically used in biomedical NERs. To increase recall, two heuristics-based post-processing steps were introduced: abbreviation recognition, and re-labelling based on a token’s chemical segment composition. The chemical NER was used to generate predictions for both the Chemical Entity Mention recognition (CEM) and Chemical Document Indexing (CDI) subtasks of the challenge. Results obtained from training a model on the provided training set and testing on the development set show that employing chemistry-specific features and heuristics leads to an increase in performance in both subtasks.
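The abstract names the two post-processing heuristics but does not spell out how they work, so the following is only a minimal Python sketch of what such steps could look like. Everything in it is an assumption made for illustration: the segment list `CHEM_SEGMENTS`, the `B-CHEM`/`I-CHEM`/`O` label names, the 0.7 coverage threshold, and the three function names are hypothetical and are not taken from the described system.

```python
# Hypothetical list of chemical word segments (NOT the list used by the authors).
CHEM_SEGMENTS = ("meth", "eth", "prop", "but", "yl", "ol", "ane", "ene",
                 "ine", "oxy", "chlor", "fluor", "amine", "acid")


def segment_coverage(token):
    """Fraction of a token's characters covered by any known chemical segment."""
    lower = token.lower()
    if not lower:
        return 0.0
    covered = [False] * len(lower)
    for seg in CHEM_SEGMENTS:
        start = lower.find(seg)
        while start != -1:
            for i in range(start, start + len(seg)):
                covered[i] = True
            start = lower.find(seg, start + 1)
    return sum(covered) / len(lower)


def relabel_by_segments(tokens, labels, threshold=0.7):
    """Relabel 'O' tokens that consist mostly of chemical segments (assumed rule)."""
    return [
        "B-CHEM" if lab == "O" and len(tok) > 3
        and segment_coverage(tok) >= threshold else lab
        for tok, lab in zip(tokens, labels)
    ]


def propagate_abbreviations(tokens, labels):
    """If a chemical mention is directly followed by '( ABBR )', label every
    unlabelled occurrence of ABBR in the token sequence as chemical (assumed rule)."""
    abbreviations = set()
    for i in range(1, len(tokens) - 2):
        if labels[i - 1] != "O" and tokens[i] == "(" and tokens[i + 2] == ")":
            abbreviations.add(tokens[i + 1])
    return [
        "B-CHEM" if tok in abbreviations and lab == "O" else lab
        for tok, lab in zip(tokens, labels)
    ]
```

Both functions take the token sequence and the label sequence predicted by a CRF and only ever turn `O` labels into chemical labels, which is why steps of this kind can raise recall at some cost in precision. A fuller implementation would also need to keep the BIO sequence consistent when a relabelled token is adjacent to an existing mention.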
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrid. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...
Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics
BACKGROUND: The development of robust methods for chemical named entity recognition, a challenging natural language processing task, was previously hindered by the lack of publicly available, large-scale, gold standard corpora. The recent public release of a large chemical entity-annotated corpus as a resource for the CHEMDNER track of the Fourth BioCreative Challenge Evaluation (BioCreative IV)...
Adapting ChER for the recognition of chemical mentions in patents
ChER (Chemical Entity Recogniser) is a pipeline of natural language processing tools optimised for the recognition of chemical names in scientific abstracts. It formed the basis of our submissions to the previous edition of the CHEMDNER track in BioCreative IV, and was one of the top-performing systems both for the chemical document indexing (CDI) and chemical entity mention recognition (CEM) s...
NEROC: Named Entity Recognizer of Chemicals
We describe a pipeline system, Named Entity Recognizer of Chemicals (NEROC), that aims to identify chemical entities mentioned in free texts. The system is based on a machine learning approach, a Conditional Random Field (CRF), and a selection of feature sets that are used to capture specific characteristics of chemical named entities. In this paper, we report results produced by CRF model...
Aggregating Machine Learning and Rule Based Heuristics for Named Entity Recognition
This paper, submitted as an entry for the NERSSEAL-2008 shared task, describes a system built for Named Entity Recognition for South and South East Asian Languages. Our paper combines machine learning techniques with language-specific heuristics to model the problem of NER for Indian languages. The system has been tested on five languages: Telugu, Hindi, Bengali, Urdu and Oriya. It uses CRF (Co...
Publication date: 2013